12 research outputs found

    BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing

    Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lend themselves well to high-performance implementations. Many matrix-multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. We present BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing. BISMO utilizes the excellent binary-operation performance of FPGAs to offer matrix multiplication performance that scales with required precision and parallelism. We characterize the resource usage and performance of BISMO across a range of parameters to build a hardware cost model, and demonstrate a peak performance of 6.5 TOPS on the Xilinx PYNQ-Z1 board. (Comment: To appear at FPL'18.)
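
    The bit-serial decomposition that BISMO builds on can be sketched in a few lines: an integer matrix product is a weighted sum of binary (1-bit) matrix products, one per pair of bit-planes, so total work scales with the product of the operand bit-widths. The sketch below (illustrative only, not BISMO's hardware datapath) shows the arithmetic identity for unsigned operands.

```python
import numpy as np

def bit_serial_matmul(A, B, bits_a, bits_b):
    """Compute A @ B for unsigned integer matrices as a weighted sum of
    binary matrix products, one per (i, j) bit-plane pair. Work scales
    with bits_a * bits_b, so lower precision runs proportionally faster."""
    m, _ = A.shape
    _, n = B.shape
    acc = np.zeros((m, n), dtype=np.int64)
    for i in range(bits_a):
        Ai = (A >> i) & 1            # i-th bit-plane of A (a binary matrix)
        for j in range(bits_b):
            Bj = (B >> j) & 1        # j-th bit-plane of B
            # binary matmul, weighted by 2^(i+j) per the place values
            acc += (Ai @ Bj).astype(np.int64) << (i + j)
    return acc
```

    The identity used is A @ B = sum over i, j of 2^(i+j) * (A_i @ B_j), where A_i and B_j are the bit-planes; each binary product maps to AND-popcount operations, which FPGAs execute very efficiently.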

    Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing

    Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lend themselves well to high-performance implementations. Many matrix-multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing, previously utilized the excellent binary-operation performance of FPGAs to offer matrix multiplication performance that scales with required precision and parallelism. We show how BISMO can be scaled up on Xilinx FPGAs using an arithmetic architecture that better utilizes 6-LUTs. The improved BISMO achieves a peak performance of 15.4 binary TOPS on the Ultra96 board with a Xilinx UltraScale+ MPSoC. (Comment: Invited paper at ACM TRETS as an extension of the FPL'18 paper arXiv:1806.0886.)

    RePAiR: A Strategy for Reducing Peak Temperature while Maximising Accuracy of Approximate Real-Time Computing: Work-in-Progress

    Improving accuracy in approximate real-time computing without violating the thermal-energy constraints of the underlying hardware is a challenging problem. The execution of an approximate real-time task can be split into two parts: (i) execution of the mandatory part of the task to obtain a result of acceptable quality, followed by (ii) partial or complete execution of the optional part, which refines the initially obtained result to increase accuracy without violating the temporal deadline. This paper introduces RePAiR, a novel task-allocation strategy for approximate real-time applications, combined with fine-grained DVFS, online task migration across cores, and power-gating of the last-level cache, to reduce chip temperature while respecting both deadline and thermal constraints. Furthermore, the gained thermal benefits can be traded against system-level accuracy by extending the execution time of the optional part.
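
    The mandatory/optional split described above follows a standard anytime-computation pattern, which can be sketched as below. The function names are illustrative, not from the paper: the mandatory part produces an acceptable result, then optional refinement steps consume whatever slack remains before the deadline.

```python
import time

def run_approximate_task(mandatory, refine_step, deadline_s):
    """Run the mandatory part, then spend any remaining time before the
    deadline on optional refinement steps that improve accuracy."""
    start = time.monotonic()
    result = mandatory()                    # acceptable-quality result
    while time.monotonic() - start < deadline_s:
        improved = refine_step(result)      # one optional refinement step
        if improved is None:                # nothing left to refine
            break
        result = improved
    return result
```

    Under RePAiR, a scheduler would additionally decide how much of this optional phase each task may run, trading the accuracy gain against the chip-temperature and energy budget.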

    Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks

    MicroScope and other similar microarchitectural replay attacks take advantage of the characteristics of speculative execution to trap the execution of the victim application in a loop, enabling the attacker to amplify a side-channel attack by executing it indefinitely. Due to the nature of the replay, it can be used to effectively attack software that is shielded against replay, even under conditions where a side-channel attack would not be possible (e.g., in secure enclaves). At the same time, unlike speculative side-channel attacks, microarchitectural replay attacks can be used to amplify the correct path of execution, rendering many existing speculative side-channel defenses ineffective. In this work, we generalize microarchitectural replay attacks beyond MicroScope and present an efficient defense against them. We make the observation that such attacks rely on repeated squashes of so-called "replay handles" and that the instructions causing the side-channel must reside in the same reorder buffer window as the handles. We propose Delay-on-Squash, a hardware-only technique for tracking squashed instructions and preventing them from being replayed by speculative replay handles. Our evaluation shows that it is possible to achieve full security against microarchitectural replay attacks with very modest hardware requirements while still maintaining 97% of the insecure baseline performance.
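
    The core bookkeeping the abstract describes can be modelled very simply. The class below is a behavioural toy, not the paper's hardware design: it remembers which instructions were flushed by a speculative squash and refuses to speculatively re-issue them until the squashing replay handle is known to be resolved, which removes the attacker's ability to replay them.

```python
class DelayOnSquash:
    """Toy model of squash tracking: squashed instructions are delayed
    (must wait to execute non-speculatively) until the replay handle
    that squashed them has left the reorder-buffer window."""

    def __init__(self):
        self.delayed = set()    # PCs of squashed, not-yet-safe instructions

    def on_squash(self, squashed_pcs):
        # record every instruction flushed by a speculative squash
        self.delayed.update(squashed_pcs)

    def may_issue_speculatively(self, pc):
        # tracked instructions may not be replayed speculatively
        return pc not in self.delayed

    def on_handle_resolved(self, pcs_now_safe):
        # once the replay handle resolves, re-execution is harmless
        self.delayed.difference_update(pcs_now_safe)
```

    The real mechanism must bound this state in hardware (the paper reports very modest requirements); the toy set here only conveys the track-then-delay policy.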

    Twig: Multi-agent task management for colocated latency-critical cloud services

    Many of the important services running on data centres are latency-critical, time-varying, and demand strict user satisfaction. Stringent tail-latency targets for colocated services and increasing system complexity make it challenging to reduce the power consumption of data centres. Data centres typically sacrifice server efficiency to maintain tail-latency targets, resulting in an increased total cost of ownership. This paper introduces Twig, a scalable quality-of-service (QoS) aware task manager for latency-critical services co-located on a server system. Twig successfully leverages deep reinforcement learning to characterise tail latency using hardware performance counters and to drive energy-efficient task management decisions in data centres. We evaluate Twig on a typical data centre server managing four widely used latency-critical services. Our results show that Twig outperforms prior works in reducing energy usage by up to 38% while achieving up to a 99% QoS guarantee for latency-critical services. This work was funded by the European Union under grant agreement No 754337 (EuroEXA), the Brazilian federal government under CNPq grant (Process no 430188/2018-8), and the Swedish Research Council under grant 2015-05159. The experiments were conducted on the NTNU EPIC computing infrastructure with support from NTNU's HPC group.
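
    The learning loop behind a Twig-style manager can be illustrated with a much simpler stand-in. The sketch below uses a tabular, epsilon-greedy agent rather than Twig's deep reinforcement learning, and the state/action encoding is invented for illustration: the state would come from hardware performance counters, the action is a resource allocation, and the reward favours energy savings only while the tail-latency target holds.

```python
import random

def choose_action(q, state, actions, eps=0.1):
    """Epsilon-greedy selection over a tabular Q function: mostly pick
    the best-known allocation, occasionally explore a random one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def update(q, state, action, reward, alpha=0.5):
    """Move the stored value for (state, action) toward the observed
    reward (bandit-style one-step update)."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward - old)
```

    In this framing, a reward could be positive when power drops without a QoS violation and strongly negative on a tail-latency miss, so the agent learns allocations that save energy only when it is safe to do so.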

    ARCTIC: Approximate Real-Time Computing in a Cache-Conscious Multicore Environment

    Improving result accuracy in approximate computing (AC) based time-critical systems, without violating the power constraints of the underlying circuitry, is becoming increasingly challenging with the rapid progress of technology scaling. The execution span of each AC real-time task can be split into two parts: (i) the mandatory part, whose execution offers a result of acceptable quality, followed by (ii) the optional part, which can be executed partially or completely to refine the initially obtained result and increase result accuracy, while respecting the time constraint. In this article, we introduce ARCTIC, a novel hybrid offline-online scheduling strategy for AC real-time tasks. The goal of the real-time scheduler is to maximise the result accuracy (QoS) of the task-set through opportunistic shedding of the optional part, while respecting system-wide constraints. During execution, ARCTIC retains an exclusive copy of each private cache block only in the local cache of a multi-core system, maintains no copies of these blocks in the other caches, and improves performance (i.e., reduces execution time) by accumulating more live blocks on-chip. Combining offline scheduling with the online cache optimization improves both QoS and energy efficiency. Surpassing prior art, our proposed strategy reduces the task-rejection rate by up to 25% and enhances QoS by 10%, with an average energy-delay-product gain of up to 9.1% on an 8-core system.

    Logic Circuit

    publication date: 2017-12-12; filing date: 2016-05-0